Evaluation of Different Approaches to Training a Genre Classifier

Authors

  • Vedrana Vidulin
  • Mitja Lustrek
  • Matjaz Gams
Abstract

This paper presents experiments on classifying web pages by genre. First, a corpus of 1539 manually labeled web pages was prepared. Second, 502 genre features were selected based on the literature and on observation of the corpus. Third, these features were extracted from the corpus to obtain a data set. Finally, three machine learning algorithms, one for the induction of decision trees (J48) and two ensemble algorithms (bagging and boosting), were trained and tested on the data set. Additionally, the impact of feature selection on the ensemble algorithms was tested. The best-performing genre classifiers in terms of precision were selected to form a best-of set of classifiers. On average, the best-of set achieved 9% better precision, but slightly worse recall. Accuracy and F-measure did not vary significantly. The results indicate that classification by genre could be a useful addition to search engines.
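The abstract compares classifiers by precision, recall, accuracy and F-measure. As a minimal sketch of how these four measures relate for a single genre treated as a yes/no decision, the Python below computes them from true and predicted labels. The function name `binary_metrics` and the toy labels are illustrative assumptions, not taken from the paper, and the paper's own evaluation procedure may differ.

```python
def binary_metrics(true_labels, predicted_labels):
    """Compute precision, recall, F-measure and accuracy for one genre.

    Labels are 1 (page belongs to the genre) or 0 (it does not).
    Hypothetical helper for illustration only.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Toy data (made up): 8 pages, 3 of which truly belong to the genre.
true_labels = [1, 1, 1, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(true_labels, predicted_labels)
print(metrics)
```

Because F-measure is the harmonic mean of precision and recall, a classifier that gains precision while losing some recall can end up with a nearly unchanged F-measure, which is consistent with the pattern the abstract reports.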

Similar resources

Cross-genre training for automatic prosody classification

We consider methods for training a prosodic classifier using labeled training data from a different genre than the one on which the system will be deployed. Two binary tasks are considered: word-level pitch accent and phrase boundary detection. Using radio news and conversational telephone speech, we consider cross-genre training using acoustic and textual features, and find that acoustic featu...

Automatic music genre classification using second-order statistical measures for the prescriptive approach

Several works proposed for automatic musical genre classification are based on various combinations of parameters, exploiting different models. However, comparing all previous works remains impossible, since they used different target taxonomies, genre definitions and databases. In this paper, the world's largest music database (Real World Computing) is used. Also, different measures re...

A machine vision system for detecting egg fertility and an evaluation of the performance of neural networks and support vector machines in it

In this research, a system is proposed for detecting the fertility of eggs. The system is composed of two parts: hardware and software. The fabricated hardware provides a platform for obtaining accurate images of the inside of the eggs without harming their embryos. The software part includes a set of image processing and machine vision processes, which is able to detect the fertility of eggs from c...

Comparison of the Performance of Genre Classifiers Trained by Different Machine Learning Algorithms

Modern search engines aim at classifying web pages not only according to topics, but also according to genres. This paper presents the results of an attempt to train a genre classifier. We present features extracted from a 20-genre corpus used for training the genre classifiers and the results of using different machine learning (ML) algorithms in the process of learning. Success of the genre c...

Evaluation of Jamendo Database as Training Set for Automatic Genre Recognition

Research on automatic music classification has gained significance in recent years due to a significant increase in the size of music collections. Music is easily available through mobile and internet channels, so there is a need to manage music by categorizing it for search and discovery. This paper focuses on music classification by genre which is a type of supervised learning oriented pr...

Publication date: 2007